Enhancing MPI+OpenMP Task Based Applications for Heterogeneous Architectures with GPU Support

Abstract

Heterogeneous supercomputers are widespread among HPC systems, and programming efficient applications on these architectures is a challenge. Task-based programming models are a promising way to tackle this challenge. Since OpenMP 4.0 and 4.5, the target directives make it possible to offload pieces of code to GPUs and to express them as tasks with dependencies. Therefore, heterogeneous machines can be programmed using MPI+OpenMP(task+target) to exhibit a very high level of concurrent asynchronous operations, for which data transfers, kernel executions, communications, and CPU computations can be overlapped. Hence, it is possible to suspend tasks performing these asynchronous operations on the CPUs and to overlap their completion with the execution of another task. Suspended tasks can resume once the associated asynchronous event has completed, in an opportunistic way at every scheduling point. We have integrated this feature into the MPC framework, validated it on an AXPY microbenchmark, and evaluated it on an MPI+OpenMP(tasks) implementation of the LULESH proxy application. The results show that we are able to improve asynchronism and overall performance, allowing applications to benefit from asynchronous execution on heterogeneous machines.
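The abstract's point that target directives can be expressed as tasks with dependencies can be illustrated with a minimal sketch. It is not the paper's code or the MPC implementation: the function name `axpy_offload` and the follow-up CPU reduction are assumptions made for illustration, and only standard OpenMP 4.5 clauses (`nowait`, `depend`, `map`) are used. Built without OpenMP or without an offload device, the pragmas are ignored or fall back to the host, and the result is the same.

```c
#include <stddef.h>

/* Hypothetical helper (not from the paper): computes y = a*x + y as a
 * deferred target task, then sums y in a dependent CPU task. */
double axpy_offload(size_t n, double a, const double *x, double *y)
{
    double sum = 0.0;

    #pragma omp parallel
    #pragma omp single
    {
        /* Target task: 'nowait' makes it deferrable, so the encountering
         * thread is free to run other tasks while the transfers and the
         * kernel are in flight. */
        #pragma omp target teams distribute parallel for nowait \
            map(to: x[0:n]) map(tofrom: y[0:n]) depend(inout: y[0:n])
        for (size_t i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];

        /* CPU task ordered after the target task through the dependency
         * on y; it only starts once the offloaded AXPY has completed. */
        #pragma omp task depend(in: y[0:n]) shared(sum)
        for (size_t i = 0; i < n; ++i)
            sum += y[i];
    } /* implicit barrier: all tasks are complete here */

    return sum;
}
```

Calling `axpy_offload(4, 2.0, x, y)` with `x = {1, 2, 3, 4}` and `y = {10, 10, 10, 10}` leaves `y` as `{12, 14, 16, 18}` and returns their sum. How a runtime overlaps the wait for the target task with other work is implementation-specific; the suspend/resume mechanism the abstract describes is a feature of the authors' runtime, not something this sketch implements.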

Similar resources

Automatic OpenCL Task Adaptation for Heterogeneous Architectures

OpenCL defines a common parallel programming language for all devices, although writing tasks adapted to the devices, managing communication and load-balancing issues are left to the programmer. In this work, we propose a novel automatic compiler and runtime technique to execute single OpenCL kernels on heterogeneous multi-device architectures. The technique proposed is completely transparent t...

Efficient Support for Matrix Computations on Heterogeneous Multi-core and Multi-GPU Architectures

We present a new methodology for utilizing all CPU cores and all GPUs on a heterogeneous multicore and multi-GPU system to support matrix computations efficiently. Our approach is able to achieve four objectives: a high degree of parallelism, minimized synchronization, minimized communication, and load balancing. Our main idea is to treat the heterogeneous system as a distributed-memory machine...

Compiling Stream Applications for Heterogeneous Architectures

Compiling Stream Applications for Heterogeneous Architectures by Amir H. Hormati

Hardware support for Local Memory Transactions on GPU Architectures

Graphics Processing Units (GPUs) are popular hardware accelerators for data-parallel applications, enabling the execution of thousands of threads in a Single Instruction Multiple Thread (SIMT) fashion. However, the SIMT execution model is not efficient when code includes critical sections to protect the access to data shared by the running threads. In addition, GPUs offer two shared spaces to t...

Distributed learning of CNNs on heterogeneous CPU/GPU architectures

Convolutional Neural Networks (CNNs) have been shown to be powerful classification tools in tasks that range from check reading to medical diagnosis, reaching close to human perception, and in some cases surpassing it. However, the problems to solve are becoming larger and more complex, which translates to larger CNNs, leading to longer training times (the computationally complex part) that not even the...

Journal

Journal title: Lecture Notes in Computer Science

Year: 2022

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-031-15922-0_1